Abstract. We explore the relation between the techniques of statistical mechanics and
information theory for assessing the performance of channel coding. We base our study on
a framework developed by Gallager in IEEE Trans. Inform. Theory 11, 3 (1965), where the
minimum decoding error probability is upper-bounded by an average of a generalized Chernoff's
bound over a code ensemble. We show that the resulting bound in the framework can be
directly assessed by the replica method, which has been developed in statistical mechanics
of disordered systems, whereas in Gallager's original methodology further replacement by
another bound utilizing Jensen's inequality is necessary. Our approach associates a seemingly
ad hoc restriction with respect to an adjustable parameter for optimizing the bound with a
phase transition between two replica symmetric solutions, and can improve the accuracy of
performance assessments of general code ensembles including low density parity check codes,
although its mathematical justification is still open.

1. Introduction

In the last few decades, much attention has been paid to the similarities between statistical 
mechanics and information theory. In general, inference or search problems that arise in research 
on communication, inference, learning, combinatorics and other information theory fields can 
be treated by regarding the system as a virtual spin system subject to disordered interactions
[1, 2, 3]. In this way, problems in information theory have been successfully analyzed utilizing
methods developed in statistical mechanics [4, 5, 6, 7, 8], and vice versa [9, 10].

This research trend has shown that the similarities between the two fields are not limited 
to the structure of problems but also apply to analysis techniques. However, because the 
development histories of the two frameworks have been relatively independent, there are still 
barriers which may hinder further expansion and deepening of this promising interdisciplinary 
research field. In order to overcome possible obstacles, it is of great importance to investigate the 
methodological relations between the two fields. This article is written under this motivation. 
More precisely, we explore the similarities and differences between the techniques of statistical 
mechanics and information theory in analyzing channel coding (or error correcting codes). 

This article is organized as follows. In sections 2 and 3, we briefly review a standard framework 
of (classical) channel coding and a conventional methodology for assessing its performance, which 
was developed by Gallager in [11]. Sections 4 and 5 are the main parts of the current article. In
section 4, we reconsider the channel coding problem by applying the replica method developed
in statistical mechanics. Using the replica method makes it possible to avoid applying Jensen's
inequality, which is required in the original methodology. This offers a novel interpretation of the
origin of a superficially ad hoc restriction with respect to an adjustable parameter for tightening 
the upper-bound of the minimum decoding error probability that appears in the conventional 
approach. For the ensemble of all codes, applying the replica method does not change the assessed
performance; however, it can improve the accuracy of the performance assessment for general code
ensembles, including low-density parity-check codes, as shown in section 5. The final section,
section 6, is devoted to a summary and discussion.

2. Framework of channel coding 

Consider a message $m \in \{1, 2, \ldots, 2^K\}$ transmitted to a receiver through a classical noisy
channel. For this purpose, $m$ is, in general, mapped to an $N$-dimensional codeword
$x_m = (x_{m1}, x_{m2}, \ldots, x_{mN}) \in \{0,1\}^N$ prior to the transmission. The mapping $m \to x_m$
($m = 1, 2, \ldots, 2^K$) can equivalently be expressed as $C = \{x_1, x_2, \ldots, x_{2^K}\}$ and is termed a
channel coding or simply a (channel) code.

The receiver must infer the original message $m$ from the received degraded codeword of
$N$ dimension $y = (y_1, y_2, \ldots, y_N)$. For simplicity, we assume a memoryless channel with the
degradation process modeled by a conditional probability $P(y|x_m) = \prod_{l=1}^{N} P(y_l|x_{ml})$. We
further assume that the message $m$ is encoded by a method of source coding such that it is
equally generated with a probability of $2^{-K}$, which is preferred for enhancing communication
performance (see footnote 1). Under these assumptions, Bayesian theory indicates that for a given
code $C$, the maximum likelihood (ML) decoding

$$\hat{m}(y) = \mathop{\rm argmax}_{s \in \{1, 2, \ldots, 2^K\}} \{ P(y|x_s) \} , \qquad (1)$$

minimizes the probability of decoding error

$$P_e(C) = 2^{-K} \sum_{m, y} P(y|x_m) \Delta_{\rm ML}(m, y) , \qquad (2)$$

where ${\rm argmax}\{\cdots\}$ denotes the argument that maximizes $\cdots$, and $\Delta_{\rm ML}(m, y) = 1$ if the original
message $m$ is not correctly retrieved by equation (1) for a given $y$ and $\Delta_{\rm ML}(m, y) = 0$ otherwise.
In the following, we address the problem of assessing how small a $P_e(C)$ is achievable by selecting
the optimal $C$ among a given code ensemble.
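
As a concrete illustration of equations (1) and (2), the following minimal sketch evaluates $P_e(C)$ exactly by enumerating every received word. It assumes a binary symmetric channel and a toy repetition code, and it counts ties in the likelihood as errors; the function names and the example code are hypothetical choices made for this illustration only.

```python
import itertools

def bsc_likelihood(y, x, p):
    """P(y|x) for a memoryless binary symmetric channel with crossover probability p."""
    flips = sum(yi != xi for yi, xi in zip(y, x))
    return (p ** flips) * ((1 - p) ** (len(x) - flips))

def ml_error_probability(code, p):
    """Exact P_e(C) of equation (2), enumerating all received words y."""
    n = len(code[0])
    total = 0.0
    for y in itertools.product((0, 1), repeat=n):
        likes = [bsc_likelihood(y, x, p) for x in code]
        for m, like in enumerate(likes):
            # Delta_ML(m, y) = 1 when some other codeword is at least as likely as x_m.
            # Counting ties as errors is an assumption of this sketch.
            wrong = any(likes[s] >= like for s in range(len(code)) if s != m)
            total += like * wrong
    return total / len(code)

# Hypothetical toy code: K = 1, N = 3 repetition code.
repetition = [(0, 0, 0), (1, 1, 1)]
print(ml_error_probability(repetition, 0.1))
```

The enumeration costs $2^N \times 2^K$ likelihood evaluations, which is why the bounds developed in the following sections are needed for codes of realistic length.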

3. Conventional scheme for analyzing channel coding 

3.1. Generalized Chernoff's bound 

As $\Delta_{\rm ML}(m, y)$ depends on $m$ and $y$ in a highly nonlinear manner, direct evaluation of equation
(2) is difficult. In order to avoid this difficulty, several techniques for upper-bounding this
function have been developed in conventional information theory [11, 12, 13, 14, 15].
The inequality

$$\Delta_{\rm ML}(m, y) \le \left( \sum_{s \ne m} \left( \frac{P(y|x_s)}{P(y|x_m)} \right)^{\lambda} \right)^{\rho} , \qquad (3)$$

which holds for $\forall \lambda > 0$ and $\forall \rho > 0$, is key for this purpose. This is validated as follows.
The right hand side is non-negative and therefore satisfies the inequality if $\Delta_{\rm ML}(m, y) = 0$. If
$\Delta_{\rm ML}(m, y) = 1$, there exists at least one message $s$ for which $P(y|x_s) \ge P(y|x_m)$. For such a
message, $\lambda > 0$ ensures that the term $(P(y|x_s)/P(y|x_m))^{\lambda}$ in the summation on the right hand
side of equation (3) is not less than unity. As all terms are non-negative, the summation is
therefore not less than unity, and $\rho > 0$ ensures that the inequality remains valid after the
summation is raised to the power $\rho$.

Footnote 1. In the conventional argument of information theory, channel coding is examined independently of source coding
without assuming a prior distribution of messages. However, we here assume that messages are uniformly
distributed a priori as a result of source coding in order to emphasize the optimality of the maximum-likelihood
decoding.

Substituting equation (3) into equation (2) yields a generalized Chernoff's bound

$$P_e(C) \le 2^{-K} \sum_{m, y} P(y|x_m)^{1 - \lambda\rho} \left( \sum_{s \ne m} P(y|x_s)^{\lambda} \right)^{\rho} , \qquad (4)$$

which holds for $\forall \lambda > 0$ and $\forall \rho > 0$. This indicates that the accuracy of the upper-bound can
be improved by minimizing the right hand side with respect to these parameters.
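
The bound of equation (4) can be checked numerically on a small example. The sketch below assumes a binary symmetric channel and a hypothetical four-codeword code of length 5, and compares the exact error probability of equation (2) with the right hand side of equation (4) for a few arbitrary choices of $(\lambda, \rho)$; ties in the likelihood are counted as errors, which is an assumption of this illustration.

```python
import itertools

def bsc(y, x, p):
    """P(y|x) for a memoryless binary symmetric channel with crossover probability p."""
    d = sum(a != b for a, b in zip(y, x))
    return p ** d * (1 - p) ** (len(x) - d)

def exact_and_bound(code, p, lam, rho):
    """Return (exact P_e(C) of eq. (2), generalized Chernoff bound of eq. (4))."""
    n, num_words = len(code[0]), len(code)
    exact = bound = 0.0
    for y in itertools.product((0, 1), repeat=n):
        likes = [bsc(y, x, p) for x in code]
        for m in range(num_words):
            others = [likes[s] for s in range(num_words) if s != m]
            # Delta_ML(m, y): an error whenever a competitor is at least as likely.
            exact += likes[m] * any(o >= likes[m] for o in others)
            # Summand of equation (4).
            bound += likes[m] ** (1 - lam * rho) * sum(o ** lam for o in others) ** rho
    return exact / num_words, bound / num_words

# Hypothetical toy code with 2^K = 4 codewords of length N = 5.
toy_code = [(0, 0, 0, 0, 0), (1, 1, 1, 0, 0), (0, 0, 1, 1, 1), (1, 1, 0, 1, 1)]
for lam, rho in [(0.5, 1.0), (1.0, 0.5), (0.3, 2.0)]:
    exact, bound = exact_and_bound(toy_code, 0.1, lam, rho)
    print(f"lambda={lam}, rho={rho}: exact={exact:.5f}, bound={bound:.5f}")
```

Scanning $(\lambda, \rho)$ in this way shows how the tightness of the bound depends on the two parameters, which is the freedom exploited in the remainder of the analysis.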



3.2. Ensemble average as an upper-bound for the minimum 

Unfortunately, direct minimization of the right hand side of equation (4) is non-trivial due to
the complicated dependence on $C$. However, the expression can still be useful for assessing the
minimum error probability among all possible codes, $P_e = \min_{C \in \{\rm all\ codes\}} \{P_e(C)\}$, for classical
channels.

For this purpose, we introduce an ensemble of all codes $Q(C) = \prod_{s=1}^{2^K} Q(x_s)$, where $Q(x)$
is an identical distribution for generating codewords $x_1, x_2, \ldots, x_{2^K}$ independently. Averaging
equation (4) with respect to $Q(C)$ gives an upper-bound of $P_e$ as

$$P_e \le \overline{P_e(C)} \le \sum_{C \in \{\rm all\ codes\}} Q(C) \left( 2^{-K} \sum_{m, y} P(y|x_m) \left( \sum_{s \ne m} \left( \frac{P(y|x_s)}{P(y|x_m)} \right)^{\lambda} \right)^{\rho} \right)$$

$$= 2^{-K} \sum_{y} \sum_{m=1}^{2^K} \sum_{x_m} Q(x_m) P(y|x_m)^{1 - \lambda\rho} \left[ \sum_{C \setminus x_m} \prod_{s \ne m} Q(x_s) \left( \sum_{s \ne m} P(y|x_s)^{\lambda} \right)^{\rho} \right] , \qquad (5)$$

due to the fact that the minimum value over a given ensemble never exceeds the average
over the ensemble. Here, $\overline{\cdots}$ represents the average over a code ensemble $Q(C)$ and $C \setminus x_m$ denotes
the subset of $C = \{x_1, x_2, \ldots, x_{2^K}\}$ from which only $x_m$ is excluded.



3.3. Jensen's inequality and random coding exponent 

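Equation (5) is still difficult to assess for large $K$ because the right hand side involves the
fractional moment of a sum of exponentially many terms,
$\sum_{C \setminus x_m} \prod_{s \ne m} Q(x_s) \left( \sum_{s \ne m} P(y|x_s)^{\lambda} \right)^{\rho}$,
the direct and rigorous evaluation of which requires an exponentially large computational cost
even though the code ensemble is factorizable with respect to codewords. Jensen's inequality

$$\sum_{C \setminus x_m} \prod_{s \ne m} Q(x_s) \left( \sum_{s \ne m} P(y|x_s)^{\lambda} \right)^{\rho} \le \left( \sum_{C \setminus x_m} \prod_{s \ne m} Q(x_s) \sum_{s \ne m} P(y|x_s)^{\lambda} \right)^{\rho}$$

$$= (2^K - 1)^{\rho} \left( \sum_{x} Q(x) P(y|x)^{\lambda} \right)^{\rho} < 2^{\rho K} \left( \sum_{x} Q(x) P(y|x)^{\lambda} \right)^{\rho} , \qquad (6)$$

which holds for $0 < \rho \le 1$, is a standard technique of information theory to overcome this
difficulty. Plugging this into equation (5), in conjunction with the additional restriction $\rho \le 1$,
we obtain the expression

$$P_e < 2^{\rho K} \sum_{y} \left( \sum_{x} Q(x) P(y|x)^{1 - \lambda\rho} \right) \left( \sum_{x} Q(x) P(y|x)^{\lambda} \right)^{\rho} \qquad (7)$$

($0 \le \rho \le 1$), where $2^{-K} \sum_{m=1}^{2^K} \sum_{x_m} Q(x_m) P(y|x_m)^{1 - \lambda\rho} = \sum_{x} Q(x) P(y|x)^{1 - \lambda\rho}$ is utilized and
the trivial case $\rho = 0$ is included.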

For any given $0 \le \rho \le 1$, the upper-bound of equation (7) is generally minimized
by $\lambda = 1/(1 + \rho)$, as assumed in Gallager's paper [11]. The computational difficulty for
assessing equation (7) is resolved for memoryless channels $P(y|x) = \prod_{l=1}^{N} P(y_l|x_l)$ by assuming
factorizable distributions $Q(x) = \prod_{l=1}^{N} Q(x_l)$. This assumption naturally indicates that the
upper-bound depends exponentially on the code length $N$ as $P_e < \exp\left[ -N \left( -\rho R \ln 2 + E_0(\rho, Q) \right) \right]$,
where $R = K/N$ and

$$E_0(\rho, Q) = -\ln \left[ \sum_{y} \left( \sum_{x} Q(x) P(y|x)^{1/(1+\rho)} \right)^{1+\rho} \right] \qquad (8)$$

are often termed the code rate and Gallager function, respectively (in equation (8), $x$ and $y$ denote
single channel symbols). This means that if $N$ is
sufficiently large and the random coding exponent

$$E_r(R) = \max_{0 \le \rho \le 1,\, Q} \left\{ -\rho R \ln 2 + E_0(\rho, Q) \right\} \qquad (9)$$

is positive for a given $R$, there exists a code with a decoding error probability smaller than an
arbitrary positive number. For a fixed $Q(x)$, $E_0(\rho, Q)$ is a convex upward function satisfying
$E_0(\rho = 0, Q) = 0$ and

$$\left. \frac{\partial}{\partial \rho} E_0(\rho, Q) \right|_{\rho = 0} = \sum_{y, x} Q(x) P(y|x) \ln \frac{P(y|x)}{\sum_{x'} Q(x') P(y|x')} = I(Q) \ln 2 , \qquad (10)$$

where $I(Q)$ represents the mutual information between $x$ and $y$ (in bits). This implies that the
critical rate $R_c$ below which $E_r(R)$ becomes positive is given by $\rho = 0$ as

$$R_c = \max_{Q} \{ I(Q) \} , \qquad (11)$$

which agrees with the definition of the channel capacity.

As $R$ is reduced from $R_c$, the value of $\rho$ that optimizes the right hand side of equation (9)
increases and reaches $\rho = 1$ at a certain rate $R_b$. Below $R_b$, equation (9) is always optimized
at the boundary $\rho = 1$. Figure 1 shows an example of $E_r(R)$ for the binary symmetric channel
(BSC), which is characterized by a crossover rate of $0 < p < 1$ as $P(1|0) = P(0|1) = p$ and
$P(1|1) = P(0|0) = 1 - p$ for binary alphabets $x, y \in \{0, 1\}$.

$E_r(R)$ characterizes an upper-bound of a typical decoding error probability of randomly
constructed codes. However, surprisingly enough, it is known that for certain classes of channels,
$E_r(R)$ represents the performance of the best codes at the level of exponent for a relatively high
code rate region $R > R_a$, which contains $R = R_b$, since $E_r(R)$ agrees with the exponent of a
lower bound of the best possible code [15]. This is far from trivial because the restriction $\rho \le 1$,
which governs $E_r(R)$ for $R < R_b$, is introduced in an ad hoc manner when employing Jensen's
inequality in the above methodology.

Figure 1. Random coding exponent $E_r(R)$ for BSC for a crossover rate $p = 0.1$. $E_r(R)$ (solid
curve) becomes positive for $R < R_c$ ($\simeq 0.531$). The functional form of $E_r(R)$ for $R < R_b$ ($\simeq 0.189$)
differs from that for $R_b < R < R_c$. The broken curve represents the value of the upper-bound
exponent that is maximized without the restriction $\rho \le 1$.
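
The two characteristic rates quoted in the caption of Figure 1 can be reproduced with a short numerical sketch. It assumes the BSC with the uniform input distribution $Q(0) = Q(1) = 1/2$ and $\lambda = 1/(1+\rho)$; the grid search and the finite-difference derivative are illustrative choices rather than the procedure used to draw the figure, and all function names are hypothetical.

```python
import math

LN2 = math.log(2.0)

def e0_bsc(rho, p):
    """Gallager function E_0(rho, Q) in nats for a BSC with uniform Q."""
    a = 1.0 / (1.0 + rho)
    # sum_y (sum_x Q(x) P(y|x)^a)^(1+rho) = 2 * ((p^a + (1-p)^a) / 2)^(1+rho)
    return rho * LN2 - (1.0 + rho) * math.log(p ** a + (1.0 - p) ** a)

def d_e0(rho, p, h=1e-6):
    """Central finite-difference estimate of dE_0/drho."""
    return (e0_bsc(rho + h, p) - e0_bsc(rho - h, p)) / (2.0 * h)

def random_coding_exponent(R, p, grid=2001):
    """E_r(R): maximum of -rho R ln2 + E_0(rho) over a grid of rho in [0, 1]."""
    rhos = (i / (grid - 1) for i in range(grid))
    return max(-rho * R * LN2 + e0_bsc(rho, p) for rho in rhos)

p = 0.1
R_c = d_e0(0.0, p) / LN2   # slope at rho = 0: the rate below which E_r(R) > 0
R_b = d_e0(1.0, p) / LN2   # slope at rho = 1: below this rate the optimum sticks to rho = 1
print(f"R_c = {R_c:.4f}, R_b = {R_b:.4f}")
for R in (0.1, 0.3, 0.5):
    print(f"E_r({R}) = {random_coding_exponent(R, p):.5f}")
```

The rates are read off from the slope of $E_0$ because the stationarity condition of equation (9) with respect to $\rho$ is $R \ln 2 = \partial E_0 / \partial \rho$.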



4. Performance assessment by the replica method 

4.1. Expanding the upper-bound for $\rho = 1, 2, \ldots$

In order to clarify the origin of the superficially artificial restriction $\rho \le 1$, we evaluate the
exponent without using Jensen's inequality. For this purpose, we assess the right hand side of
equation (5) analytically, continuing the expressions obtained for $\rho = 1, 2, \ldots$ to $\rho \in \mathbb{R}$. This is
often termed the replica method [16, 17].

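For the current problem, the first step of the replica method is to evaluate the expression

$$\sum_{C \setminus x_m} \prod_{s \ne m} Q(x_s) \left( \sum_{s \ne m} P(y|x_s)^{\lambda} \right)^{\rho} = \sum_{\{s^a\}_{a=1}^{\rho}} \prod_{r \ne m} \left( \sum_{x_r} Q(x_r) P(y|x_r)^{\lambda \sum_{a=1}^{\rho} \delta(s^a, r)} \right)$$

$$= \sum_{(i_1, i_2, \ldots, i_\rho)} W(i_1, i_2, \ldots, i_\rho) \prod_{t=1}^{\rho} \left( \sum_{x} Q(x) P(y|x)^{\lambda t} \right)^{i_t} , \qquad (12)$$

analytically for $\rho = 1, 2, \ldots$, where $\delta(x, y) = 1$ for $x = y$ and vanishes otherwise, and
$W(i_1, i_2, \ldots, i_\rho)$ is the number of ways of partitioning $\rho$ replica messages $s^1, s^2, \ldots, s^\rho$ to $i_1$
states (out of $r = 1, 2, \ldots, 2^K$ except for $r = m$) by one, to $i_2$ states by two, $\ldots$ and to $i_\rho$ states
by $\rho$. Obviously, $W(i_1, i_2, \ldots, i_\rho) = 0$ unless $\sum_{t=1}^{\rho} t\, i_t = \rho$. It is worth noting that the expression
of the right hand side is valid only for $\rho = 1, 2, \ldots$.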
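
The partition expansion of equation (12) can be verified by brute force for small integer $\rho$. The sketch below is an illustration under stated assumptions: `M` stands for the number $2^K - 1$ of competing codewords, `mu[t]` stands for the moment $\sum_x Q(x) P(y|x)^{\lambda t}$ (arbitrary integers are used so that the comparison is exact), and the closed form used for $W$ is our own reading of the verbal definition above, not a formula quoted from the text.

```python
import itertools
import math
from collections import Counter

def occupancy_vectors(rho):
    """All (i_1, ..., i_rho) with sum_t t * i_t = rho."""
    def rec(t, remaining):
        if t > rho:
            if remaining == 0:
                yield ()
            return
        for i_t in range(remaining // t + 1):
            for rest in rec(t + 1, remaining - t * i_t):
                yield (i_t,) + rest
    return list(rec(1, rho))

def weight(i_vec, M):
    """W(i_1, ..., i_rho): assignments of rho labelled replicas to M states
    in which exactly i_t states are occupied by t replicas (assumed closed form)."""
    rho = sum(t * i for t, i in enumerate(i_vec, start=1))
    occupied = sum(i_vec)
    if occupied > M:
        return 0
    states = math.factorial(M) // math.factorial(M - occupied)
    for i in i_vec:
        states //= math.factorial(i)          # choose which states carry which occupancy
    replicas = math.factorial(rho)
    for t, i in enumerate(i_vec, start=1):
        replicas //= math.factorial(t) ** i   # distribute the labelled replicas
    return states * replicas

def moment_direct(mu, M, rho):
    """Sum over all replica assignments (s^1, ..., s^rho) of prod_r mu[n_r]."""
    total = 0
    for assignment in itertools.product(range(M), repeat=rho):
        term = 1
        for n_r in Counter(assignment).values():
            term *= mu[n_r]
        total += term
    return total

def moment_partition(mu, M, rho):
    """Right hand side of equation (12)."""
    return sum(
        weight(v, M) * math.prod(mu[t] ** i for t, i in enumerate(v, start=1))
        for v in occupancy_vectors(rho)
    )

mu = {1: 2, 2: 3, 3: 5, 4: 7}   # hypothetical integer moments
print(moment_direct(mu, 3, 4), moment_partition(mu, 3, 4))
```

With this reading, $W(\rho, 0, \ldots, 0) = M!/(M-\rho)! \simeq M^{\rho}$ and $W(0, \ldots, 0, 1) = M$, which are the two weights that dominate in the replica symmetric solutions of the next subsection.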



4.2. Saddle point assessment under the replica symmetric ansatz

Exactly evaluating equation (12) is difficult for large $K = NR$. However, in many systems,
quantities of this kind scale exponentially with respect to $N$, which implies that the exponent
characterizing the exponential dependence can be accurately evaluated by the "saddle point
method" with respect to the partition of $\rho$, $(i_1, i_2, \ldots, i_\rho)$, under an appropriate assumption of
the symmetry underlying the objective system in the limit of $N \to \infty$. The replica symmetry,
for which equation (12) is invariant under any permutation of the replica indices $a = 1, 2, \ldots, \rho$,
is critical for the current evaluation. This implies that it is natural to assume that, for large
$N$, the final expression of equation (12) is dominated by a single term possessing the same
symmetry, which yields the following two types of replica symmetric (RS) solutions:

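• RS1: Dominated by $(i_1, i_2, \ldots, i_\rho) = (\rho, 0, \ldots, 0)$, giving

$$\sum_{C \setminus x_m} \prod_{s \ne m} Q(x_s) \left( \sum_{s \ne m} P(y|x_s)^{\lambda} \right)^{\rho} \simeq W(\rho, 0, \ldots, 0) \left( \sum_{x} Q(x) P(y|x)^{\lambda} \right)^{\rho} \simeq 2^{N \rho R} \left( \sum_{x} Q(x) P(y|x)^{\lambda} \right)^{\rho} . \qquad (13)$$

• RS2: Dominated by $(i_1, i_2, \ldots, i_\rho) = (0, 0, \ldots, 1)$, giving

$$\sum_{C \setminus x_m} \prod_{s \ne m} Q(x_s) \left( \sum_{s \ne m} P(y|x_s)^{\lambda} \right)^{\rho} \simeq W(0, 0, \ldots, 1) \left( \sum_{x} Q(x) P(y|x)^{\lambda\rho} \right) \simeq 2^{N R} \sum_{x} Q(x) P(y|x)^{\lambda\rho} . \qquad (14)$$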



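Plugging these into the final expression of equation (5), in conjunction with $P(y|x) =
\prod_{l=1}^{N} P(y_l|x_l)$ and $Q(x) = \prod_{l=1}^{N} Q(x_l)$, gives the exponents

$$E_{\rm RS1}(\rho, \lambda, Q, R) = -\rho R \ln 2 - \ln \left[ \sum_{y} \left( \sum_{x} Q(x) P(y|x)^{1 - \lambda\rho} \right) \left( \sum_{x} Q(x) P(y|x)^{\lambda} \right)^{\rho} \right] \qquad (15)$$

and

$$E_{\rm RS2}(\rho, \lambda, Q, R) = -R \ln 2 - \ln \left[ \sum_{y} \left( \sum_{x} Q(x) P(y|x)^{1 - \lambda\rho} \right) \left( \sum_{x} Q(x) P(y|x)^{\lambda\rho} \right) \right] , \qquad (16)$$

where the suffixes RS1 and RS2 correspond to equations (13) and (14), respectively, as two
candidates of the exponent $E(\rho, \lambda, Q, R)$ for upper-bounding the minimum decoding error
probability as $P_e \le \exp\left[ -N E(\rho, \lambda, Q, R) \right]$.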



4.3. Phase transition between RS solutions: origin of the restriction $\rho \le 1$

Although we have so far assumed that $\rho$ is a natural number, both the functional forms of the
saddle point solutions, (15) and (16), can be defined over $\rho \in \mathbb{R}$. Therefore, we analytically
continue these expressions from $\rho = 1, 2, \ldots$ to $\rho \in \mathbb{R}$, and select the relevant solution for each
set of $(\rho, \lambda, Q, R)$ in order to obtain the correct upper-bound exponent $E(\rho, \lambda, Q, R)$. This is the
second step of the replica method.

For $\rho = 1, 2, \ldots$ and sufficiently large $N$, this can be carried out by selecting the solution
of the lesser exponent value. Unfortunately, a mathematically justified general guideline
for selecting the relevant solution for $\rho < 1$ has not yet been determined. Such a guideline
is necessary for determining the channel capacity by assessment at $\rho = 0$. However, there is
an empirical criterion for this purpose, which is indicated by the analysis of exactly solvable
models [18]. In the current case, this means that for fixed $\lambda$, $Q$ and $R$ we should choose the
solution for which the partial derivative with respect to $\rho$ at $\rho = 1$, $(\partial/\partial\rho) E_{\rm RS1}(\rho, \lambda, Q, R)|_{\rho=1}$
or $(\partial/\partial\rho) E_{\rm RS2}(\rho, \lambda, Q, R)|_{\rho=1}$, is lesser, as the relevant solution for $\rho < 1$. This criterion
implies that $E_{\rm RS1}(\rho, \lambda, Q, R)$ should be chosen to provide the tightest bound $E_{\rm replica}(R) =
\max_{0 \le \rho,\, 0 < \lambda,\, Q} \{ E(\rho, \lambda, Q, R) \}$ for relatively large $R$, which yields the expression

$$E_{\rm replica}(R) = \max_{0 \le \rho,\, 0 < \lambda,\, Q} \{ E_{\rm RS1}(\rho, \lambda, Q, R) \} = \max_{0 \le \rho,\, Q} \left\{ -\rho R \ln 2 - \ln \left[ \sum_{y} \left( \sum_{x} Q(x) P(y|x)^{1/(1+\rho)} \right)^{1+\rho} \right] \right\} . \qquad (17)$$



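As $R$ is reduced from $R = R_c$, below which equation (17) becomes positive, the value of $\rho$
that maximizes the right hand side of equation (17) increases from $\rho = 0$, keeping the relation
$\lambda = 1/(1 + \rho)$ at the maximum point. When $R$ reaches $R_b$, the optimal value of $\rho$ becomes unity
and $\lambda = 1/2$, for which

$$\left. \frac{\partial}{\partial \rho} E_{\rm RS1}(\rho, \lambda, Q, R) \right|_{(\rho, \lambda, R) = (1, 1/2, R_b)} = \left. \frac{\partial}{\partial \rho} \left\{ -\rho R_b \ln 2 - \ln \left[ \sum_{y} \left( \sum_{x} Q(x) P(y|x)^{1/(1+\rho)} \right)^{1+\rho} \right] \right\} \right|_{\rho = 1}$$

$$= \left. \frac{\partial}{\partial \rho} E_{\rm RS2}(\rho, \lambda, Q, R) \right|_{(\rho, \lambda, R) = (1, 1/2, R_b)} = 0 . \qquad (18)$$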



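This implies that for $R < R_b$, $(\partial/\partial\rho) E_{\rm RS2}(\rho, \lambda, Q, R)|_{\rho=1} < (\partial/\partial\rho) E_{\rm RS1}(\rho, \lambda, Q, R)|_{\rho=1}$
holds when the condition for a maximum is satisfied. Therefore, we should not select
$E_{\rm RS1}(\rho, \lambda, Q, R)$, but rather $E_{\rm RS2}(\rho, \lambda, Q, R)$ for assessing the tightest bound $E_{\rm replica}(R) =
\max_{0 \le \rho,\, 0 < \lambda,\, Q} \{ E(\rho, \lambda, Q, R) \}$ for $R < R_b$, which yields

$$E_{\rm replica}(R) = \max_{1 \le \rho,\, 0 < \lambda,\, Q} \{ E_{\rm RS2}(\rho, \lambda, Q, R) \} = \max_{1 \le \rho,\, 0 < \lambda,\, Q} \left\{ -R \ln 2 - \ln \left[ \sum_{y} \left( \sum_{x} Q(x) P(y|x)^{1 - \lambda\rho} \right) \left( \sum_{x} Q(x) P(y|x)^{\lambda\rho} \right) \right] \right\}$$

$$= \max_{Q} \left\{ -R \ln 2 - \ln \left[ \sum_{y} \left( \sum_{x} Q(x) P(y|x)^{1/2} \right)^{2} \right] \right\} . \qquad (19)$$

In the second line, any choice of $(\rho, \lambda)$ that satisfies $\lambda\rho = 1/2$ and $\rho \ge 1$ optimizes the exponent.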

Although the style of the derivation seems somewhat different from that of the conventional
approach, the exponents obtained by equations (17) and (19) are identical to those assessed
using equation (9). Therefore, $E_{\rm replica}(R) = E_r(R)$ holds, implying that no improvement is
gained by the replica method in the analysis of the ensemble of all codes.
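
The agreement between the replica assessment and the random coding exponent can be checked numerically for a simple channel. The sketch below assumes a BSC with the uniform input distribution, evaluates equation (15) at $\lambda = 1/(1+\rho)$ and equation (16) at $\lambda\rho = 1/2$, and applies the switching rule described above: RS1 maximized without any upper limit on $\rho$ for $R \ge R_b$, and RS2 for $R < R_b$. The grid search, the upper end of the $\rho$ range and the function names are assumptions of this illustration.

```python
import math

LN2 = math.log(2.0)

def g(a, p):
    """p^a + (1-p)^a, so that sum_x Q(x) P(y|x)^a = g(a, p) / 2 for uniform Q."""
    return p ** a + (1.0 - p) ** a

def e_rs1(rho, R, p):
    """Equation (15) for the BSC with uniform Q, at lambda = 1/(1+rho)."""
    lam = 1.0 / (1.0 + rho)
    return (rho * (1.0 - R) * LN2
            - math.log(g(1.0 - lam * rho, p))
            - rho * math.log(g(lam, p)))

def e_rs2(R, p):
    """Equation (16) for the BSC with uniform Q, at lambda * rho = 1/2."""
    return (1.0 - R) * LN2 - 2.0 * math.log(g(0.5, p))

def grid_max(f, hi, step=0.00025):
    """Maximum of f over an evenly spaced grid on [0, hi]."""
    n = int(round(hi / step))
    return max(f(hi * i / n) for i in range(n + 1))

def e_r(R, p):
    """Conventional random coding exponent: RS1 form restricted to 0 <= rho <= 1."""
    return grid_max(lambda rho: e_rs1(rho, R, p), 1.0)

def r_b(p, h=1e-6):
    """Rate at which the unrestricted maximizer of RS1 reaches rho = 1."""
    slope = (e_rs1(1.0 + h, 0.0, p) - e_rs1(1.0 - h, 0.0, p)) / (2.0 * h)
    return slope / LN2

def e_replica(R, p):
    """Replica exponent: RS1 without the restriction for R >= R_b, RS2 below R_b."""
    if R >= r_b(p):
        return grid_max(lambda rho: e_rs1(rho, R, p), 5.0)
    return e_rs2(R, p)

p = 0.1
for R in (0.05, 0.15, 0.25, 0.4, 0.5):
    print(f"R={R}: E_replica={e_replica(R, p):.6f}, E_r={e_r(R, p):.6f}")
```

For $R < R_b$, maximizing the RS1 form alone over $\rho > 1$ gives a larger value than $E_r(R)$, which corresponds to the broken curve of Figure 1; selecting RS2 instead is what brings the replica result back onto the solid curve.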

Nevertheless, our approach is still useful for clarifying the origin of the seemingly artificial
restriction $\rho \le 1$ in the conventional scheme. The above analysis indicates that there is no such
restriction as long as the upper-bound of equation (5) is directly evaluated. Instead, what is the
most relevant is the breaking of the analyticity with respect to $\rho$ of the upper-bound exponent
$E(\rho, \lambda, Q, R)$, which can be interpreted as a phase transition between the two types of replica
symmetric solutions $E_{\rm RS1}(\rho, \lambda, Q, R)$ and $E_{\rm RS2}(\rho, \lambda, Q, R)$ in the terminology of physics. As a
consequence, we have to appropriately switch the functional forms of the objective function in
order to correctly obtain the optimized exponent. However, this procedure, in practice, can
be completely simulated by optimizing a single function in conjunction with introducing an
additional restriction $\rho \le 1$, which can be summarized by a conventional formula of the random
coding exponent, namely equation (9).

Of course, it must be kept in mind that the mathematical validity of our methodology is
still open, although the known correct results are reproduced. Although applying the saddle point
assessment is a major reason for the weakening of mathematical rigor, the most significant issue
in the current context is mathematical justification of the empirical criterion at $\rho = 1$ to select
the appropriate solution for $\rho < 1$ when multiple saddle point solutions exist. Accumulated
knowledge about error exponents of various codes in information theory [19, 20, 21, 22] may
be of assistance for solving this issue.

Although we have applied the replica method to an upper-bound following the conventional
framework in order to clarify the relation to an information theory method, it can be utilized
to directly assess the minimum possible decoding error probability. For a region of lower $R$,
there still exists a gap between the lower- and upper-bounds of the error exponents of the best
possible code. An analysis based on the replica method indicates that the lower-bound of the
exponent, which corresponds to the upper-bound of the decoding error probability, agrees with
the correct solution [23].

5. Analysis of low-density parity-check codes 

5.1. Definition of an LDPC code ensemble 

Although a novel interpretation is obtained, our approach does not update known results in the 
analysis of the ensemble of all codes. However, this is not the case in general; the replica method 
usually offers a smaller upper-bound than conventional schemes for general code ensembles. We 
will show this for an ensemble of low-density parity-check (LDPC) codes. 

A $(k, j)$ LDPC code is defined by selecting $N - K$ parity checks composed of $k$ components,
$x_{l_1} \oplus x_{l_2} \oplus \cdots \oplus x_{l_k} = 0$, out of the possible combinations of indices for characterizing a binary
codeword of length $N$, $x = (x_l) \in \{0, 1\}^N$, where $l_1, l_2, \ldots, l_k = 1, 2, \ldots, N$ and $\oplus$ denotes
addition over the binary field. There are several ways to define an LDPC code ensemble. For
analytical convenience, we here focus on an ensemble constructed by uniformly selecting $N - K$
ordered combinations of $k$ different indices $l_1, l_2, \ldots, l_k$, $\langle l_1 l_2 \ldots l_k \rangle$, for parity checks, so that
each component index of codewords $l\ (= 1, 2, \ldots, N)$ appears $j$ times in the total set of parity
checks. A code $C$ constructed in this way is specified by a set of binary variables $c = \{ c_{\langle l_1 l_2 \ldots l_k \rangle} \}$,
where $c_{\langle l_1 l_2 \ldots l_k \rangle} = 1$ if the combination $\langle l_1 l_2 \ldots l_k \rangle$ is used for a parity check and $c_{\langle l_1 l_2 \ldots l_k \rangle} = 0$
otherwise.

For simplicity, we assume symmetric channels, where we can assume that the sent message
$m$ is encoded into the null codeword $x = 0$. Under this assumption, the generalized Chernoff's
bound (4) for an LDPC code is expressed as

$$P_e(C) \le \sum_{y} P(y|0)^{1 - \lambda\rho} \left( \sum_{x \ne 0} \mathcal{I}(x|c) P(y|x)^{\lambda} \right)^{\rho} , \qquad (20)$$

where

$$\mathcal{I}(x|c) = \prod_{\langle l_1 l_2 \ldots l_k \rangle} \left( 1 - c_{\langle l_1 l_2 \ldots l_k \rangle} + c_{\langle l_1 l_2 \ldots l_k \rangle} \delta(x_{l_1} \oplus x_{l_2} \oplus \cdots \oplus x_{l_k}, 0) \right) \qquad (21)$$

returns unity if $x$ satisfies all the parity checks and vanishes otherwise, screening only codewords
in the summation over $x$ on the right hand side of equation (20).
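
The screening role of the indicator in equation (21) can be illustrated with a few lines of code. The sketch below represents a code by the list of index combinations with $c_{\langle l_1 \ldots l_k \rangle} = 1$, which is equivalent to the product form because combinations with $c = 0$ contribute a factor of one. The toy parity checks, the zero-based indexing and the function names are hypothetical choices for this illustration.

```python
import itertools

def indicator(x, checks):
    """I(x|c): 1 if x satisfies every selected parity check, 0 otherwise."""
    return int(all(sum(x[l] for l in check) % 2 == 0 for check in checks))

def codewords(n, checks):
    """Enumerate all binary words of length n that pass the indicator."""
    return [x for x in itertools.product((0, 1), repeat=n) if indicator(x, checks)]

# Hypothetical toy instance with N = 6, k = 3, j = 1 (indices are zero-based here):
# N - K = 2 checks, so K = 4 and 2^K = 16 codewords are expected.
toy_checks = [(0, 1, 2), (3, 4, 5)]
words = codewords(6, toy_checks)
print(len(words))
```

Exhaustive enumeration is only feasible for very small $N$; the ensemble average in the next subsection avoids it altogether.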



5.2. Performance assessment by the replica method 

Unlike the random code ensemble explored in the previous section, a statistical dependence 
arises among codewords in an LDPC code. This yields atypically bad codes, the minimum 
distance of which is of the order of unity with a probability of algebraic dependence on N. The 
contribution of such atypical codes causes the average of the decoding error probability over a 
naive LDPC code ensemble to decay algebraically with respect to N, indicating that the error 
exponent vanishes even for a sufficiently small rate R [23]. However, we can reduce the fraction 
of the bad codes to as small as required by removing short cycles in the parity check dependence 
by utilizing certain feasible algorithms [23]. This implies that, in practice, the performance of 
the LDPC code ensembles can be characterized by analysis with respect to the typical codes 
utilizing the saddle point method as shown below [26]. 

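In order to employ the replica method, we assess the average of the right hand side of equation
(20) with respect to the LDPC code ensemble

$$Q(c) = \frac{1}{\mathcal{N}(k, j)} \prod_{l=1}^{N} \delta\left( \sum_{\langle l_2 l_3 \ldots l_k \rangle} c_{\langle l\, l_2 l_3 \ldots l_k \rangle},\, j \right) , \qquad (22)$$

where $\mathcal{N}(k, j) = \sum_{c} \prod_{l=1}^{N} \delta\left( \sum_{\langle l_2 l_3 \ldots l_k \rangle} c_{\langle l\, l_2 l_3 \ldots l_k \rangle},\, j \right)$ stands for the number of $(k, j)$ LDPC codes.
For $\rho = 1, 2, \ldots$ and sufficiently large $N$, evaluating this using the saddle point method, substituting
$P(y|x) = \prod_{l=1}^{N} P(y_l|x_l)$, gives an upper-bound for the average decoding error probability
over an ensemble of typical LDPC codes from which atypically bad codes are expurgated as
$\overline{P_e(C)} \le \exp\left[ -N E_{\rm LDPC}(\rho, \lambda, R) \right]$, where

$$E_{\rm LDPC}(\rho, \lambda, R) = -\mathop{\rm Extr}_{X, \hat{X}} \Biggl\{ \frac{N^{k-1}}{k!} \sum_{b_1, b_2, \ldots, b_k} \prod_{t=1}^{k} X(b_t) \prod_{a=1}^{\rho} \delta(b_1^a \oplus b_2^a \oplus \cdots \oplus b_k^a, 0) - \sum_{b} X(b) \hat{X}(b)$$

$$+ \ln \left[ \sum_{y} P(y|0)^{1 - \lambda\rho} \left( \sum_{x} \hat{X}(x)^{j} \prod_{a=1}^{\rho} P(y|x^a)^{\lambda} \right) \right] \Biggr\} + \frac{j}{k} - j + j \ln \left[ \frac{(jN)^{1 - 1/k}}{((k-1)!)^{1/k}} \right] , \qquad (23)$$

$b = (b^1, b^2, \ldots, b^\rho) \in \{0, 1\}^{\rho}$ and $x = (x^1, x^2, \ldots, x^\rho) \in \{0, 1\}^{\rho}$. ${\rm Extr}_X$ denotes the operation of
extremization with respect to $X$, which corresponds to the saddle point assessment of a certain
complex integral and therefore does not necessarily mean maximization or minimization. An
outline of the derivation is shown in Appendix A.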

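An RS solution which is relevant for $0 \le \rho \le 1$ corresponding to RS1 in the previous section
is obtained under the RS ansatz

$$X(b) = q \int_{-1}^{+1} du\, \pi(u) \prod_{a=1}^{\rho} \frac{1 + u (-1)^{b^a}}{2} , \qquad (24)$$

$$\hat{X}(b) = \hat{q} \int_{-1}^{+1} d\hat{u}\, \hat{\pi}(\hat{u}) \prod_{a=1}^{\rho} \frac{1 + \hat{u} (-1)^{b^a}}{2} , \qquad (25)$$

where $q$ and $\hat{q}$ are normalization variables that constrain the respective variational functions
$\pi(u)$ and $\hat{\pi}(\hat{u})$ to be distributions over $[-1, 1]$, making it possible to analytically continue the
expression (23) from $\rho = 1, 2, \ldots$ to $\rho \in \mathbb{R}$. Carrying out partial extremization with respect to
$q$ and $\hat{q}$ yields an analytically continued RS upper-bound exponent

$$E_{\rm LDPC}^{\rm RS}(\rho, \lambda, R) = -\mathop{\rm Extr}_{\pi, \hat{\pi}} \Biggl\{ \frac{j}{k} \ln \left[ \int \prod_{t=1}^{k} du_t\, \pi(u_t) \left( \frac{1 + \prod_{t=1}^{k} u_t}{2} \right)^{\rho} \right] - j \ln \left[ \int du\, d\hat{u}\, \pi(u) \hat{\pi}(\hat{u}) \left( \frac{1 + u \hat{u}}{2} \right)^{\rho} \right]$$

$$+ \ln \left[ \sum_{y} P(y|0)^{1 - \lambda\rho} \int \prod_{t=1}^{j} d\hat{u}_t\, \hat{\pi}(\hat{u}_t) \left( \sum_{x = 0, 1} \prod_{t=1}^{j} \frac{1 + \hat{u}_t (-1)^{x}}{2} P(y|x)^{\lambda} \right)^{\rho} \right] \Biggr\} , \qquad (26)$$

where the functional extremization ${\rm Extr}_{\pi, \hat{\pi}} \{ \cdots \}$ can be performed numerically in a feasible time
by Monte Carlo methods in practice [27].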



5.3. Comparison of lower-bound estimates of error threshold

When the noise level is sufficiently small and the code length $N$ is sufficiently large, there exists
at least one $(k, j)$ LDPC code with a decoding error probability smaller than an arbitrary positive
number. The maximum value of such noise levels is sometimes termed the error threshold.

Equation (26) can be utilized to assess a lower-bound of the error threshold. Table 1 shows the
lower-bounds obtained by maximizing this equation with respect to $\rho > 0$ and $\lambda > 0$ for several
sets of $(k, j)$. Estimates obtained by the conventional schemes utilizing Jensen's inequality,
which in the current case are determined by an upper-bound exponent

$$E_{\rm LDPC}^{\rm Jensen}(\rho, \lambda, R) = -\mathop{\rm Extr}_{u, \hat{u}} \Biggl\{ \frac{\rho j}{k} \ln \left( \frac{1 + u^{k}}{2} \right) + \ln \left[ \sum_{y} P(y|0)^{1 - \lambda\rho} \left( \sum_{x = 0, 1} \left( \frac{1 + \hat{u} (-1)^{x}}{2} \right)^{j} P(y|x)^{\lambda} \right)^{\rho} \right]$$

$$- \rho j \ln \left( \frac{1 + u \hat{u}}{2} \right) \Biggr\} , \qquad (27)$$

are also provided for comparison.

Table 1 indicates that, in general, the lower-bounds estimated by the replica method are not
smaller than those of the conventional schemes. This implies that unlike the case of the ensemble 
of all codes, employing Jensen's inequality can relax an upper-bound for general code ensembles 
and therefore there may be room for improvement in results obtained by conventional schemes 
based on this inequality. 
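
The reference values against which the threshold estimates are compared can be sketched as follows. The code assumes that the rate of a $(k, j)$ code follows from counting parity-check slots, $(N - K) k = N j$, so that $R = 1 - j/k$, and it solves $1 - H_2(p) = R$ for the BSC crossover rate by bisection; the function names are hypothetical and the bisection is an illustrative choice.

```python
import math

def h2(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def shannon_crossover(R, tol=1e-12):
    """Crossover rate p in [0, 1/2] at which the BSC capacity 1 - H2(p) equals R."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 1.0 - h2(mid) > R:
            lo = mid      # capacity still exceeds R: more noise can be tolerated
        else:
            hi = mid
    return 0.5 * (lo + hi)

for j, k in [(3, 6), (3, 5), (4, 6), (2, 3), (2, 4)]:
    R = 1.0 - j / k   # assumed relation between (j, k) and the code rate
    print(f"(j, k) = ({j}, {k}): R = {R:.3f}, p_Shannon = {shannon_crossover(R):.4f}")
```

Any threshold estimate obtained from an upper-bound exponent has to lie below this crossover rate for the corresponding code rate.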



6. Summary and discussion 

In summary, we have explored the relation between statistical mechanics and information theory 
methods for assessing performance of channel coding, based on a framework developed by 
Gallager [11]. An average of a generalized Chernoff's bound for probability of decoding error over
a given code ensemble can be directly evaluated by the replica method of statistical mechanics, 
while Jensen's inequality must be applied in a conventional information theory approach. The 
direct evaluation of the average associated a switch of two analytic functions in the random 
coding exponent known in information theory with a phase transition between two replica 
symmetric solutions obtained by the replica method. Better lower-bounds of the error threshold 
were obtained for ensembles of LDPC codes under the assumption that the replica method 
produces the correct results. This may motivate an improvement in the accuracy of performance 
assessment, refining the conventional methodologies. 



R      (j,k)    Jensen 1    Jensen 2    replica    Shannon
1/2    (3,6)    0.0678      0.0915      0.0998     0.109
2/5    (3,5)    0.115       0.129       0.136      0.145
1/3    (4,6)    0.1705      0.1709      0.173      0.174
1/3    (2,3)                0.0670      0.0670     0.174
1/2    (2,4)                0.0286      0.0286     0.109

Table 1. Lower-bound estimates of the error threshold of BSC. In columns "Jensen 1",
"Jensen 2" and "replica", the estimates represent the critical crossover rates $p_c$, below which
the maximized values of equation (26) or (27) are positive. In the evaluation, the exponents are
maximized with respect to two parameters $\rho > 0$ and $\lambda > 0$ for "Jensen 2" and "replica" while
a single parameter maximization with respect to $\rho > 0$, keeping $\lambda = 1/(1 + \rho)$, is performed for
"Jensen 1". "Shannon" represents the channel capacity for a given code rate $R$.



A characteristic feature of the methods developed in statistical mechanics is the employment 
of the saddle point assessment utilizing a certain symmetry underlying the objective system, 
which, in some cases, makes it possible to accurately analyze macroscopic properties of large 
systems even when there are statistical correlations or constraints among system components. 
Such approaches may also be useful for analyzing codes of quantum information, for which, in 
many cases, there arise non-trivial correlations among codewords for the purpose of dealing with 
noncommutativity of operators [28].

Appendix A. Outline of derivation of equation (23)

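Equation (23) is obtained by averaging the right hand side of equation (20) with respect to
the $(k, j)$ LDPC code ensemble (22). For this assessment, we first evaluate the normalization
constant $\mathcal{N}(k, j)$ utilizing the identity

$$\delta(n, j) = \oint \frac{dZ}{2\pi {\rm i}} Z^{-(j+1)} Z^{n} , \qquad ({\rm A}.1)$$

where ${\rm i} = \sqrt{-1}$ and $\oint dZ$ denotes the contour integral along a closed curve surrounding
the origin on the complex plane. Plugging this expression into $\mathcal{N}(k, j) =
\sum_{c} \prod_{l=1}^{N} \delta\left( \sum_{\langle l_2 l_3 \ldots l_k \rangle} c_{\langle l\, l_2 l_3 \ldots l_k \rangle},\, j \right)$ yields

$$\mathcal{N}(k, j) = \sum_{c} \oint \prod_{l=1}^{N} \frac{dZ_l}{2\pi {\rm i}} Z_l^{-(j+1)} \prod_{\langle l_1 l_2 \ldots l_k \rangle} (Z_{l_1} Z_{l_2} \cdots Z_{l_k})^{c_{\langle l_1 l_2 \ldots l_k \rangle}}$$

$$= \oint \prod_{l=1}^{N} \frac{dZ_l}{2\pi {\rm i}} Z_l^{-(j+1)} \prod_{\langle l_1 l_2 \ldots l_k \rangle} (1 + Z_{l_1} Z_{l_2} \cdots Z_{l_k})$$

$$\simeq \oint \prod_{l=1}^{N} \frac{dZ_l}{2\pi {\rm i}} Z_l^{-(j+1)} \exp \left[ \sum_{\langle l_1 l_2 \ldots l_k \rangle} Z_{l_1} Z_{l_2} \cdots Z_{l_k} \right]$$

$$\simeq \oint \prod_{l=1}^{N} \frac{dZ_l}{2\pi {\rm i}} Z_l^{-(j+1)} \exp \left[ \frac{1}{k!} \sum_{l_1, l_2, \ldots, l_k} Z_{l_1} Z_{l_2} \cdots Z_{l_k} \right]$$

$$\simeq \oint \prod_{l=1}^{N} \frac{dZ_l}{2\pi {\rm i}} Z_l^{-(j+1)} \exp \left[ \frac{1}{k!} \left( \sum_{l=1}^{N} Z_l \right)^{k} \right] . \qquad ({\rm A}.2)$$

Here, in the third to fifth lines we have omitted irrelevant higher order terms since
they do not affect the following saddle point assessment. Inserting the identity
$1 = N \int dq_0\, \delta\left( \sum_{l=1}^{N} Z_l - N q_0 \right) = \frac{N}{2\pi {\rm i}} \int dq_0 \int_{-{\rm i}\infty}^{+{\rm i}\infty} d\hat{q}_0 \exp \left[ \hat{q}_0 \left( \sum_{l=1}^{N} Z_l - N q_0 \right) \right]$ into this
expression makes it possible to analytically integrate equation (A.2) with respect to $Z_l$ ($l =
1, 2, \ldots, N$). For large $N$, the most dominant contribution to the resulting integral with respect
to $q_0$ and $\hat{q}_0$ can be evaluated by the saddle point method as

$$\frac{1}{N} \ln \mathcal{N}(k, j) \simeq \mathop{\rm Extr}_{q_0, \hat{q}_0} \left\{ \frac{N^{k-1}}{k!} q_0^{k} - q_0 \hat{q}_0 + \ln \left( \frac{\hat{q}_0^{\,j}}{j!} \right) \right\} = \frac{j}{k} - j + \ln \left[ \frac{(jN)^{j - j/k}}{((k-1)!)^{j/k}\, j!} \right] , \qquad ({\rm A}.3)$$

where the saddle point is given as $q_0 = ((k-1)!)^{1/k} j^{1/k} N^{-(1 - 1/k)}$ and $\hat{q}_0 = ((k-1)!)^{-1/k} (jN)^{1 - 1/k}$.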
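The average of the right hand side of equation (20) for $\rho = 1, 2, \ldots$ can be evaluated in
a similar manner. For this, we expand $\left( \sum_{x \ne 0} \mathcal{I}(x|c) P(y|x)^{\lambda} \right)^{\rho}$ and take the average with
respect to $c$, utilizing the LDPC code ensemble (22). For each fixed set of $x^1, x^2, \ldots, x^\rho$, we
obtain the expression

$$\sum_{c} Q(c) \prod_{a=1}^{\rho} \mathcal{I}(x^a|c) \simeq \frac{1}{\mathcal{N}(k, j)} \oint \prod_{l=1}^{N} \frac{dZ_l}{2\pi {\rm i}} Z_l^{-(j+1)} \exp \left[ \frac{1}{k!} \sum_{b_1, b_2, \ldots, b_k} \prod_{t=1}^{k} \left( \sum_{l=1}^{N} Z_l \prod_{a=1}^{\rho} \delta(x_l^a, b_t^a) \right) \prod_{a=1}^{\rho} \delta(b_1^a \oplus b_2^a \oplus \cdots \oplus b_k^a, 0) \right] , \qquad ({\rm A}.4)$$

where we have introduced the dummy variables $b_t = (b_t^1, b_t^2, \ldots, b_t^\rho)$ ($t = 1, 2, \ldots, k$) as

$$\prod_{a=1}^{\rho} \delta(x_{l_1}^a \oplus x_{l_2}^a \oplus \cdots \oplus x_{l_k}^a, 0) = \sum_{b_1, b_2, \ldots, b_k} \prod_{t=1}^{k} \left( \prod_{a=1}^{\rho} \delta(x_{l_t}^a, b_t^a) \right) \prod_{a=1}^{\rho} \delta(b_1^a \oplus b_2^a \oplus \cdots \oplus b_k^a, 0) , \qquad ({\rm A}.5)$$

in order to decouple $x_{l_1}, x_{l_2}, \ldots, x_{l_k}$ of the left hand side. Inserting the identity

$$1 = \prod_{b} \left\{ N \int dX(b)\, \delta\left( \sum_{l=1}^{N} Z_l\, \delta(x_l, b) - N X(b) \right) \right\} = \left( \frac{N}{2\pi {\rm i}} \right)^{2^{\rho}} \int \prod_{b} dX(b)\, d\hat{X}(b) \exp \left[ \sum_{b} \hat{X}(b) \left( \sum_{l=1}^{N} Z_l\, \delta(x_l, b) - N X(b) \right) \right] , \qquad ({\rm A}.6)$$

where $x_l = (x_l^1, x_l^2, \ldots, x_l^\rho)$ ($l = 1, 2, \ldots, N$) and $\delta(x_l, b) = \prod_{a=1}^{\rho} \delta(x_l^a, b^a)$, into equation (A.4) allows integration with respect
to $Z_l$ ($l = 1, 2, \ldots, N$) to be performed analytically. The resulting expression enables us to
take summations with respect to $x_l$ ($l = 1, 2, \ldots, N$) independently in assessing the average,
which yields identical contributions for $l = 1, 2, \ldots, N$ and leads to the saddle point evaluation
of equation (23).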